1.3 Business Goals

For our project, we propose ten questions to analyze submissions and comments about Dogecoin on Reddit. Our analysis draws on a combination of subreddits, including r/CryptoCurrency and r/dogecoin.

1.3.1 Exploratory Data Analysis (EDA):

1. How do activity levels (such as post and comment volumes) within Dogecoin discussions on selected subreddits change over time, and what patterns emerge when analyzed by month or hour?

By analyzing Dogecoin discussions across selected subreddits, we observe fluctuating activity levels, with post and comment volumes showing discernible patterns when segmented by month or hour. These trends often align with market movements or significant community events, suggesting a correlation between online engagement and Dogecoin’s market dynamics.
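
The monthly and hourly segmentation described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's actual pipeline. It assumes each post or comment carries a Unix epoch timestamp in seconds, and the function name `activity_counts` and the sample values are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def activity_counts(timestamps):
    """Count items per calendar month and per hour of day (UTC)."""
    by_month = Counter()
    by_hour = Counter()
    for ts in timestamps:
        dt = datetime.fromtimestamp(ts, tz=timezone.utc)
        by_month[dt.strftime("%Y-%m")] += 1
        by_hour[dt.hour] += 1
    return by_month, by_hour


# Hypothetical epoch timestamps (seconds), for illustration only.
sample = [
    1612137600,  # 2021-02-01 00:00:00 UTC
    1612141200,  # 2021-02-01 01:00:00 UTC
    1614556800,  # 2021-03-01 00:00:00 UTC
]
by_month, by_hour = activity_counts(sample)
```

On the full dataset the same grouping would more likely be expressed as a distributed aggregation, but the logic (truncate the timestamp, then count) stays the same.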

2. Are there dedicated users discussing Dogecoin? What do user patterns indicate?

Observing user behavior can help us understand whether the same users discuss Dogecoin across multiple subreddits. The scores their posts and comments receive can indicate how well the community accepts their contributions. Additionally, tracking the frequency and timing of these discussions might reveal when the community is most active and engaged with the topic.

3. Is there a noticeable difference in sentiment and recommendations toward Dogecoin between r/dogecoin and other cryptocurrency-related subreddits, and how do buy signals compare across these communities?

A comparative analysis of sentiments towards Dogecoin between r/dogecoin and other cryptocurrency-related subreddits reveals distinct perspectives. r/dogecoin generally exhibits more positive sentiment and a higher frequency of buy signals, underscoring the community’s optimism compared to the broader crypto discourse.

1.3.2 Natural Language Processing (NLP):

Analyze the evolution of sentiment surrounding Dogecoin, particularly in response to major events or influencer actions, leveraging financial sentiment analysis models to quantify and track changes in the perceived outlook toward this cryptocurrency.

4. How can analyzing basic text characteristics, like term importance and frequency, provide insights into the cryptocurrency subreddit communities?

Basic descriptive analysis of the text data provides essential insights into the language patterns and characteristics of Dogecoin discussions. We examine text length distributions, frequent word usage, and term importance through TF-IDF scoring, and compare the results between the subreddit communities r/CryptoCurrency and r/dogecoin.
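
To make the TF-IDF scoring mentioned above concrete, here is a minimal standard-library sketch. It assumes whitespace tokenization and the unsmoothed textbook weighting (relative term frequency times log(N / document frequency)). Library implementations typically use smoothed variants, so absolute values will differ. The function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document.

    score = (term count / document length) * log(N / document frequency)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return scores


# Hypothetical two-document corpus, for illustration only.
docs = ["doge to the moon", "bitcoin to the bank"]
scores = tf_idf(docs)
```

Terms shared by every document (here "to" and "the") score zero, while terms unique to one document score highest, which is the property that makes the measure useful for contrasting two communities.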

5. What are the key themes and areas of focus driving conversations around Dogecoin, as identified by topic modeling?

Topic modeling techniques such as Latent Dirichlet Allocation (LDA) uncover the prominent themes driving conversations about Dogecoin within the subreddit communities. The primary topics include market predictions, community projects, and technological developments.

6. How does sentiment towards Dogecoin evolve in response to real-world events, as expressed in Reddit posts?

Employing Spark NLP models specifically designed for financial sentiment analysis, we track how sentiment surrounding Dogecoin evolves, especially in response to major events or influencer statements. This enables a better understanding of market dynamics, especially since volatile financial markets are particularly susceptible to periods of excitement.

1.3.3 Machine Learning (ML):

7. How can LSTM models be utilized to predict long-term trends in Dogecoin prices based on sentiments expressed in Reddit discussions?

LSTM models, neural networks known for their ability to capture long-term dependencies in time series data, can be applied to forecast Dogecoin price trends by analyzing sentiment trends in Reddit discussions. This approach leverages the temporal patterns of sentiment and its potential predictive power to anticipate future price movements over extended periods.
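
Before a sequence model such as an LSTM can be trained, the sentiment series has to be cut into fixed-length input windows paired with a later price value. The sketch below shows one common way to do that and is not necessarily the preparation used in this project. The function name `make_windows`, the window length, and the sample values are all hypothetical.

```python
def make_windows(series, targets, window):
    """Pair each run of `window` consecutive inputs with the target that follows it."""
    if len(series) != len(targets):
        raise ValueError("series and targets must have the same length")
    inputs, outputs = [], []
    for end in range(window, len(series)):
        inputs.append(series[end - window:end])  # steps end-window .. end-1
        outputs.append(targets[end])             # value to predict at step end
    return inputs, outputs


# Hypothetical daily values, for illustration only.
daily_sentiment = [0.1, 0.3, -0.2, 0.4, 0.0]
daily_price = [0.05, 0.06, 0.055, 0.07, 0.068]

X, y = make_windows(daily_sentiment, daily_price, window=3)
```

Each row of `X` would become one input sequence for the network, and the matching entry of `y` its training target. Longer windows give the model more history per example at the cost of fewer examples.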

8. Can time series modeling reveal the relationship and short-term effects between Reddit activity and Dogecoin market behavior? For example, does a rising price inspire more buy recommendations online?

Time series models like ARIMA and Vector Autoregression (VAR) can be employed to understand the short-term reciprocal effects between Reddit activity and Dogecoin market behavior. By examining the time-lagged relationships, this method assesses how changes in Reddit discussions might influence market conditions and vice versa. Specifically, it also estimates the impact of a shock (a sudden change) in price or in buy recommendations on each series' own future values and on the other series.
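
The shock analysis described above can be illustrated with a first-order VAR fitted by ordinary least squares. This is a simplified sketch under stated assumptions: a single lag, NumPy only, and simulated data standing in for the price and buy-recommendation series. The names `fit_var1` and `impulse_response` are hypothetical, and a full analysis would more likely rely on a dedicated time series library with lag selection and confidence bands.

```python
import numpy as np


def fit_var1(data):
    """Least-squares fit of y_t = c + A @ y_{t-1} + e_t.

    data: array of shape (T, k), one row per time step.
    Returns (c, A) with shapes (k,) and (k, k).
    """
    data = np.asarray(data, dtype=float)
    lagged, current = data[:-1], data[1:]
    design = np.hstack([np.ones((len(lagged), 1)), lagged])
    coef, *_ = np.linalg.lstsq(design, current, rcond=None)
    return coef[0], coef[1:].T


def impulse_response(A, shock, horizon):
    """Row h holds the response h steps after a one-time shock: A^h @ shock."""
    responses = [np.asarray(shock, dtype=float)]
    for _ in range(horizon):
        responses.append(A @ responses[-1])
    return np.array(responses)


# Simulated two-variable system (illustrative only, not project data).
rng = np.random.default_rng(0)
true_A = np.array([[0.5, 0.2], [0.1, 0.4]])
y = np.zeros((2000, 2))
for t in range(1, len(y)):
    y[t] = true_A @ y[t - 1] + rng.normal(scale=0.1, size=2)

c_hat, A_hat = fit_var1(y)
irf = impulse_response(A_hat, shock=[1.0, 0.0], horizon=5)
```

Here `irf[h, 1]` traces how a unit shock to the first series propagates to the second after `h` steps, which mirrors the question of whether a price jump feeds into later buy recommendations.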

9. How can causal inference be applied to evaluate the impact of high-profile endorsements on Dogecoin discussions on Reddit?

Using specific libraries for causal inference on observational big data on Apache Spark, we study the effect of Elon Musk’s sudden endorsement of Dogecoin with a tweet in Nov. 2022 on sentiments and buy recommendations present in subreddit discussions.*

10. What does the network of interactions among users discussing Dogecoin on Reddit reveal about the community structure? Do highly engaged and connected users post uniquely different content?

Network analysis of user interactions within Dogecoin discussions can reveal a complex web of community structure, highlighting key influencers and the flow of information. Using dimensionality reduction techniques (t-SNE) on the embeddings generated in the NLP stage, we can examine whether single-subreddit users post content that is sharply different from that of highly engaged, multi-subreddit Dogecoin enthusiasts.
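
As a rough illustration of the two user-level quantities involved, the sketch below splits authors into single-subreddit and multi-subreddit groups and counts each author's distinct reply partners (their degree in an undirected reply network). It uses only the standard library. The record layouts, function names, and sample users are hypothetical, and it does not cover the embedding or t-SNE step.

```python
from collections import defaultdict


def subreddits_per_user(activity):
    """Map each author to the set of subreddits they posted in."""
    seen = defaultdict(set)
    for author, subreddit in activity:
        seen[author].add(subreddit)
    return dict(seen)


def reply_degree(replies):
    """Number of distinct users each author exchanged replies with."""
    neighbors = defaultdict(set)
    for replier, parent_author in replies:
        if replier != parent_author:  # ignore self-replies
            neighbors[replier].add(parent_author)
            neighbors[parent_author].add(replier)
    return {user: len(others) for user, others in neighbors.items()}


# Hypothetical records, for illustration only.
activity = [
    ("alice", "dogecoin"),
    ("alice", "CryptoCurrency"),
    ("bob", "dogecoin"),
    ("carol", "dogecoin"),
]
replies = [("bob", "alice"), ("carol", "alice"), ("bob", "bob")]

subs = subreddits_per_user(activity)
multi_users = {user for user, s in subs.items() if len(s) > 1}
single_users = set(subs) - multi_users
degree = reply_degree(replies)
```

The group labels could then be used to color points in a low-dimensional projection of the text embeddings, making any separation between the two groups visible.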